NSF PAR Search | NSF Public Access Repository

Fast Sentence Classification using Word Co-occurrence Graphs

https://doi.org/10.1109/BigData62323.2024.10825869

Mishra, Ashirbad; Kirmani, Shad; Madduri, Kamesh (December 2024, IEEE)

We consider a supervised classification problem of categorizing e-commerce products based on just the words in the title. If done in real time, the categorization can greatly benefit sellers by enabling them to offer immediate feedback. We present a deterministic algorithm based on constructing weighted word co-occurrence graphs from the listing/item titles. We empirically evaluate this algorithm on two publicly available product listing datasets, Etsy and Amazon. Our method’s accuracy is comparable to that of a supervised classifier constructed using the fastText library. Inference with our model is up to 2.9× faster than with the fastText classifier, and training times are small. The training and inference of our model scale well to big datasets, supporting large-scale classification on millions of listings. We perform a detailed analysis and provide insights into our method and the product categorization task.
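The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes one weighted co-occurrence graph per category, with edge weights equal to the number of training titles in which two words appear together, and it scores a new title by summing the weights of its word pairs in each graph. The names `train` and `predict`, the weighting scheme, and the tie-break rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def title_pairs(title):
    """Return the sorted, de-duplicated word pairs of a title."""
    words = sorted(set(title.lower().split()))
    return list(combinations(words, 2))


def train(titles, labels):
    """Build one weighted co-occurrence graph per category.

    graphs[label][(w1, w2)] counts the training titles of that label
    that contain both w1 and w2 (an assumed weighting scheme).
    """
    graphs = defaultdict(lambda: defaultdict(int))
    for title, label in zip(titles, labels):
        for pair in title_pairs(title):
            graphs[label][pair] += 1
    return graphs


def predict(graphs, title):
    """Pick the category whose graph gives the title the highest score.

    Ties are broken by label name so the result is deterministic.
    Assumes at least one category was seen during training.
    """
    pairs = title_pairs(title)
    scores = {
        label: sum(graph.get(pair, 0) for pair in pairs)
        for label, graph in graphs.items()
    }
    return min(scores, key=lambda label: (-scores[label], label))


if __name__ == "__main__":
    graphs = train(
        ["silver hoop earrings", "gold hoop earrings",
         "wool knit scarf", "knit winter scarf"],
        ["jewelry", "jewelry", "clothing", "clothing"],
    )
    print(predict(graphs, "gold earrings"))
```

Both training and inference in this sketch are single passes over word pairs with dictionary lookups, which is one plausible reason a graph-based approach of this kind could be fast; the actual sources of the reported speedup are described in the paper itself.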
Full Text Available
Jet: Multilevel Graph Partitioning on Graphics Processing Units

https://doi.org/10.1137/23M1559129

Gilbert, Michael S; Madduri, Kamesh; Boman, Erik G; Rajamanickam, Siva (October 2024, SIAM Journal on Scientific Computing)

Full Text Available
Efficient community detection in multilayer networks using boolean compositions

https://doi.org/10.3389/fdata.2023.1144793

Santra, Abhishek; Irany, Fariba Afrin; Madduri, Kamesh; Chakravarthy, Sharma; Bhowmick, Sanjukta (August 2023, Frontiers in Big Data)

Networks (or graphs) are used to model the dyadic relations between entities in complex systems. Analyzing the properties of the networks reveals important characteristics of the underlying system. However, in many disciplines, including social sciences, bioinformatics, and technological systems, multiple relations exist between entities. In such cases, a simple graph is not sufficient to model these multiple relations, and a multilayer network is a more appropriate model. In this paper, we explore community detection in multilayer networks. Specifically, we propose a novel network decoupling strategy for efficiently combining the communities in the different layers using the Boolean primitives AND, OR, and NOT. Our proposed method, network decoupling, is based on analyzing the communities in each network layer individually and then aggregating the analysis results. We (i) describe our network decoupling algorithms for finding communities, (ii) present how network decoupling can be used to express different types of communities in multilayer networks, and (iii) demonstrate the effectiveness of using network decoupling for detecting communities in real-world and synthetic data sets. Compared to other algorithms for detecting communities in multilayer networks, our proposed network decoupling method requires significantly lower computation time while producing results of high accuracy. Based on these results, we anticipate that our proposed network decoupling technique will enable a more detailed analysis of multilayer networks in an efficient manner.
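To make the decoupling idea concrete, here is a small sketch of one possible reading of the AND primitive. It assumes communities have already been found in each layer separately and are given as sets of node ids, and it treats the AND of two layers as the groups of nodes that share a community in both. The function `and_compose` and the `min_size` cutoff are hypothetical; the paper's actual composition rules may differ (for example, they may operate on edges rather than node sets).

```python
def and_compose(layer_a, layer_b, min_size=2):
    """Combine per-layer communities with an assumed AND rule.

    layer_a, layer_b: lists of sets of node ids, one set per community.
    Returns the pairwise intersections that have at least min_size nodes,
    i.e. groups of nodes that are co-members in both layers.
    """
    combined = []
    for community_a in layer_a:
        for community_b in layer_b:
            common = community_a & community_b
            if len(common) >= min_size:
                combined.append(common)
    return combined


if __name__ == "__main__":
    friends = [{1, 2, 3, 4}, {5, 6, 7}]      # communities in layer 1
    coworkers = [{1, 2, 5}, {3, 4, 6, 7}]    # communities in layer 2
    print(and_compose(friends, coworkers))
```

The point of the sketch is that the combination step only touches the per-layer community lists, not the layers themselves, which is consistent with the abstract's claim that analyzing layers individually and then aggregating can save computation time.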
Full Text Available
Performance-Portable Graph Coarsening for Efficient Multilevel Graph Analysis

https://doi.org/10.1109/IPDPS49936.2021.00030

Gilbert, Michael S.; Acer, Seher; Boman, Erik G.; Madduri, Kamesh; Rajamanickam, Sivasankaran (May 2021, 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS))
The multilevel heuristic is an effective strategy for speeding up graph analytics, and graph coarsening is an integral step of multilevel methods. We perform a comprehensive study of multilevel coarsening in this work. We primarily focus on the graphics processing unit (GPU) parallelization of the Heavy Edge Coarsening (HEC) method executed in an iterative setting. We present optimizations for the two phases of coarsening: a fine-to-coarse vertex mapping phase and a coarse graph construction phase. We also express several other coarsening algorithms using the Kokkos framework and discuss their parallelization. We demonstrate the efficacy of parallelized HEC on an NVIDIA Turing GPU and a 32-core AMD Ryzen processor using multilevel spectral graph partitioning as the primary case study.
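The two phases named in this abstract can be illustrated with a sequential sketch. This is a simplified, single-threaded approximation written for readability, not the paper's GPU or Kokkos implementation: it assumes each unmapped vertex joins the aggregate of its heaviest neighbor (or starts a new aggregate with it), and that parallel edges between aggregates are merged by summing weights. The function names and the tie-break rule are hypothetical.

```python
def heavy_edge_map(adj):
    """Phase 1: fine-to-coarse vertex mapping.

    adj: {v: {u: weight}} for an undirected graph (both directions stored).
    Returns {fine_vertex: coarse_vertex_id}.
    """
    coarse = {}
    next_id = 0
    for v in sorted(adj):
        if v in coarse:
            continue
        if not adj[v]:
            # Isolated vertex: it becomes its own coarse vertex.
            coarse[v] = next_id
            next_id += 1
            continue
        # Heaviest neighbor; ties go to the smallest vertex id.
        u = min(adj[v], key=lambda n: (-adj[v][n], n))
        if u in coarse:
            coarse[v] = coarse[u]
        else:
            coarse[v] = coarse[u] = next_id
            next_id += 1
    return coarse


def build_coarse_graph(adj, coarse):
    """Phase 2: coarse graph construction.

    Edges inside an aggregate are dropped; edges between two aggregates
    are merged into one edge whose weight is the sum of the originals.
    """
    coarse_adj = {c: {} for c in set(coarse.values())}
    for v, neighbors in adj.items():
        for u, weight in neighbors.items():
            cv, cu = coarse[v], coarse[u]
            if cv != cu:
                coarse_adj[cv][cu] = coarse_adj[cv].get(cu, 0) + weight
    return coarse_adj


if __name__ == "__main__":
    # Path graph 0 -5- 1 -1- 2 -4- 3
    adj = {0: {1: 5}, 1: {0: 5, 2: 1}, 2: {1: 1, 3: 4}, 3: {2: 4}}
    mapping = heavy_edge_map(adj)
    print(mapping)
    print(build_coarse_graph(adj, mapping))
```

In a multilevel method these two functions would be applied repeatedly, feeding each coarse graph back in as the next input until the graph is small enough for the downstream analysis.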
Full Text Available
Optimizing Word2Vec Performance on Multicore Systems

https://doi.org/10.1145/3149704.3149768

Rengasamy, Vasudevan; Fu, Tao-Yang; Lee, Wang-Chien; Madduri, Kamesh (November 2017, Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms)

The Skip-gram with negative sampling (SGNS) method of Word2Vec is an unsupervised approach to map words in a text corpus to low-dimensional real vectors. The learned vectors capture semantic relationships between co-occurring words and can be used as inputs to many natural language processing and machine learning tasks. There are several high-performance implementations of the Word2Vec SGNS method. In this paper, we introduce a new optimization called context combining to further boost SGNS performance on multicore systems. For processing the One Billion Word benchmark dataset on a 16-core platform, we show that our approach is 3.53x faster than the original multithreaded Word2Vec implementation and 1.28x faster than a recent parallel Word2Vec implementation. We also show that our accuracy on benchmark queries is comparable to state-of-the-art implementations.
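For readers unfamiliar with SGNS, the sketch below shows a single stochastic gradient step for one (center word, context word) pair with a handful of negative samples, using the standard SGNS loss. It is plain, unoptimized Python and does not implement the context combining optimization, which the abstract names but does not describe. The function name `sgns_step`, the list-based vectors, and the learning rate are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sgns_step(v_center, u_context, u_negatives, lr=0.025):
    """One SGD step of skip-gram with negative sampling.

    v_center:    input vector of the center word (list of floats)
    u_context:   output vector of the observed context word
    u_negatives: output vectors of the sampled negative words
    Vectors are updated in place. Returns the loss before the update:
        -log sigmoid(u_context . v) - sum_k log sigmoid(-u_k . v)
    """
    grad_center = [0.0] * len(v_center)
    loss = 0.0
    targets = [(u_context, 1.0)] + [(u, 0.0) for u in u_negatives]
    for u, label in targets:
        score = sigmoid(sum(a * b for a, b in zip(v_center, u)))
        loss -= math.log(score if label == 1.0 else 1.0 - score)
        g = score - label  # derivative of the loss w.r.t. the dot product
        for i in range(len(u)):
            grad_center[i] += g * u[i]     # accumulate with the old u
            u[i] -= lr * g * v_center[i]   # update the output vector
    for i in range(len(v_center)):
        v_center[i] -= lr * grad_center[i]
    return loss


if __name__ == "__main__":
    v = [0.1, 0.2]
    u_pos = [0.3, -0.1]
    u_neg = [[-0.2, 0.4]]
    first = sgns_step(v, u_pos, u_neg, lr=0.1)
    second = sgns_step(v, u_pos, u_neg, lr=0.1)
    print(first, second)
```

Each step consists of a few short dot products and vector updates per training pair, which gives some intuition for why implementations of SGNS look for ways to restructure this work on multicore systems.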
Full Text Available
